1. INTRODUCTION

In an era of rapid technological progress, the fusion of artificial intelligence (AI) and
robotics is reshaping various aspects of our lives. One area where this convergence holds enormous promise
is education. This research is part of a comprehensive initiative aimed at designing and
improving a pioneering teacher robot that combines AI and image processing
technologies. The primary focus of our research is to use AI to give robot teachers
emotional intelligence (EI), which is of great importance in the teaching process [1]. To that end, our robot has
to classify a student's emotions by analyzing the student's facial expressions.

Understanding the critical role of EI in education unveils a profound connection between a teacher's 
ability to convey information and a student's capacity to comprehend and engage with it. The dynamics of 
learning extend beyond the mere transmission of facts; they are deeply intertwined with the emotional 
landscape within the classroom [2], [3]. Exploring the significance of EI in educators offers a profound 
insight into how the mastery of emotions shapes the learning environment, influences student-teacher 
interactions, and ultimately impacts the absorption and retention of knowledge [4]. 

In this work, we have created a system that gives the teacher robot emotional intelligence.
We first tried a convolutional neural network (CNN) with pooling layers, but the results were not
satisfactory. We therefore conducted a second experiment in which the pooling layers were replaced with a
learning focal point (LFP) layer, and the results were much better in terms of accuracy and
other metrics. Replacing the pooling layers with an LFP layer is the scientific contribution of this
research and the subject of the rest of this article.

A review of the AI methods used to classify emotions from facial expressions shows
that most of them rely on deep learning, that is, on training neural networks. Innovation therefore
mostly concerns the deep learning architecture. One of the most widely used architectures in facial
recognition is the CNN: many solutions and frameworks, such as DeepFace, FaceNet, LightCNN, and
others, rely on CNNs, which in turn rely on pooling layers [5]. Pooling helps to manage computational
resources, reduce the number of parameters, and focus on the most important features in the data. CNNs have
proven highly effective in various computer vision tasks due to their ability to automatically learn
hierarchical representations of features from input data [6]. CNNs derive their name from the convolutional
layers, which apply filters or kernels to the input data. These filters slide over the input, capturing local
patterns and features [7]. The result is a feature map that highlights relevant structures in the input. Among
the core components of a CNN, pooling layers play a crucial role in reducing the spatial dimensionality of
the input data, which helps control the network's computational complexity and parameter count [8]. Pooling
is a down-sampling process applied after convolutional layers to retain essential information while reducing
spatial resolution. There are two common types of pooling layers: max pooling and average pooling [9]. Max
pooling selects the maximum value from a group of neighboring pixels in the input, with the aim of
retaining the most prominent features while reducing the spatial dimensions. Average
pooling computes the average value of a group of neighboring pixels.
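
To make the two pooling operations concrete, the following minimal sketch applies them to a small feature map. It assumes non-overlapping 2x2 windows and plain Python lists; the function name `pool2d` and the sample values are illustrative and are not taken from our implementation.

```python
def pool2d(feature_map, size=2, mode="max"):
    """Down-sample a 2D feature map with non-overlapping size x size windows."""
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    for r in range(0, rows - size + 1, size):
        pooled_row = []
        for c in range(0, cols - size + 1, size):
            # Collect the values of the current window.
            window = [feature_map[r + i][c + j]
                      for i in range(size) for j in range(size)]
            if mode == "max":
                pooled_row.append(max(window))
            elif mode == "average":
                pooled_row.append(sum(window) / len(window))
            else:
                raise ValueError(f"unknown pooling mode: {mode}")
        pooled.append(pooled_row)
    return pooled


# Illustrative 4x4 feature map (values chosen arbitrarily).
feature_map = [
    [1, 3, 2, 0],
    [4, 6, 1, 1],
    [0, 2, 9, 5],
    [2, 2, 7, 3],
]
max_pooled = pool2d(feature_map, mode="max")      # [[6, 2], [2, 9]]
avg_pooled = pool2d(feature_map, mode="average")  # [[3.5, 1.0], [1.5, 6.0]]
```

Both variants halve each spatial dimension; they differ only in how the four values of a window are summarized.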

The rest of this article is organized as follows. In section 2, we present the design of the LFP
algorithm, the principles on which it is based, and how it can be used for emotion classification. In
section 3, we discuss the results and performance of the LFP algorithm in this case. Section 4 gives the
conclusion.


2. METHOD 
2.1. Emotion classification system 
Our system classifies emotions based on facial expressions. It consists of three main
modules, as shown in Figure 1.

- The first module: this unit relies on the open source computer vision (OpenCV) library to
detect and track the student's face through video or image analysis.

- The second module: it is based on the LFP algorithm and extracts the key regions of the face by returning
the coordinates of the key squares of the face.

- The third module: its role is to compute the weights of the neural network that classifies facial expressions.

Figure 1. Emotion classification system modules


2.2. Dataset 

We trained the neural network on the dataset of the Kaggle competition “Challenges in Representation
Learning: Facial Expression Recognition (FER) Challenge”, whose task is to classify facial expressions into
emotion categories. This dataset is commonly referred to as FER 2013 (FER2013). It
contains around 35,000 grayscale face images of 48x48 pixels, each labeled with one of seven emotions:
happiness, sadness, anger, surprise, fear, disgust, or neutral.
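
As an illustration of how such images can be read, the sketch below parses one record into a 48x48 grid. It assumes the CSV layout of the public FER2013 release (columns `emotion`, `pixels`, `Usage`) and its usual label order; neither is specified in this section, and the one-row file is synthetic.

```python
import csv
import io

# Label order of the public FER2013 release; stated here as an assumption,
# since the text above does not give the numeric encoding.
EMOTIONS = ["anger", "disgust", "fear", "happiness", "sadness", "surprise", "neutral"]
IMAGE_SIZE = 48


def parse_row(row):
    """Turn one CSV row into (48x48 list of pixel rows, emotion name)."""
    values = [int(v) for v in row["pixels"].split()]
    if len(values) != IMAGE_SIZE * IMAGE_SIZE:
        raise ValueError(f"expected {IMAGE_SIZE * IMAGE_SIZE} pixels, got {len(values)}")
    image = [values[r * IMAGE_SIZE:(r + 1) * IMAGE_SIZE] for r in range(IMAGE_SIZE)]
    return image, EMOTIONS[int(row["emotion"])]


# Synthetic one-row file standing in for the real dataset.
fake_pixels = " ".join(str(i % 256) for i in range(IMAGE_SIZE * IMAGE_SIZE))
fake_csv = io.StringIO(f"emotion,pixels,Usage\n3,{fake_pixels},Training\n")
image, label = parse_row(next(csv.DictReader(fake_csv)))
```

With the real file, the same `parse_row` helper would be applied to every row returned by `csv.DictReader`.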


2.3. OpenCV 

The OpenCV library is a comprehensive toolbox for image processing tasks,
including image filtering, transformation, enhancement, and geometric operations such as resizing, cropping,
and rotation [10], [11]. It provides a wide range of tools for image and video processing,
computer vision, machine learning, and image analysis [12]. OpenCV is written in C++
and also has interfaces for Python, Java, and other languages. Beyond basic operations such as morphological
operations and color space transformations, it includes computer vision algorithms for object detection,
feature extraction, image stitching, and camera calibration, as well as machine learning algorithms for
tasks such as classification, clustering, and regression. OpenCV also interoperates with deep learning
frameworks like TensorFlow and PyTorch, allowing users to run pre-trained deep learning models [13]. In our
case, we used OpenCV to detect and extract images of children’s faces, as shown in Figure 2, from the images
captured by the cameras that will be installed.


Figure 2. Face extraction and detection using OpenCV 
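
The detection step can be sketched as follows. The text does not state which OpenCV detector was used, so the Haar cascade bundled with OpenCV is an assumption here, as are the helper names `expand_box` and `extract_faces`, the margin, and the detector parameters.

```python
def expand_box(box, margin, width, height):
    """Grow an (x, y, w, h) box by `margin` pixels per side, clipped to the image."""
    x, y, w, h = box
    x0, y0 = max(0, x - margin), max(0, y - margin)
    x1, y1 = min(width, x + w + margin), min(height, y + h + margin)
    return x0, y0, x1 - x0, y1 - y0


def extract_faces(frame, size=48, margin=0):
    """Return grayscale face crops resized to size x size from a BGR frame."""
    import cv2  # imported here so expand_box stays usable without OpenCV

    # Frontal-face Haar cascade shipped with OpenCV (assumed detector).
    detector = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
    )
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    height, width = gray.shape
    faces = []
    for box in detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5):
        x, y, w, h = expand_box(tuple(int(v) for v in box), margin, width, height)
        # Crop the face and bring it to the 48x48 format of the training images.
        faces.append(cv2.resize(gray[y:y + h, x:x + w], (size, size)))
    return faces
```

A frame read with `cv2.imread` or `cv2.VideoCapture` can be passed directly to `extract_faces`; each returned crop has the same size as the training images.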


2.4. Learning focal point algorithm 

The LFP algorithm helps reduce the number of parameters and focus on the most important features
in the data. The flowchart in Figure 3 shows how the LFP algorithm works. We take a
set of images of the same size and divide each image into squares, as shown in Figure 4. We train a
perceptron on each square, compute its accuracy using (1), and finally return the
coordinates (x, y, width, and height) of the squares with the highest accuracy. In short, the LFP algorithm
uses the accuracy of perceptrons trained on different squares of the images to select the squares whose
coordinates it returns. It therefore plays the same role as the max pooling and average pooling used in
CNNs. As a scientific contribution, we replace the pooling layers with an LFP layer.

The LFP algorithm follows a systematic procedure to identify key focal points within input images.
It first segments each image into distinct parts (Div[i]) and trains a perceptron on each part.
It then computes the accuracy obtained on each part and sorts the Div[i] segments by that accuracy.
Finally, it returns the coordinates of the segments with the highest
accuracy, which localize the key focal points. This process distills image complexity and enables a
focused analysis of essential features.
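
The procedure described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not our implementation: it uses binary labels instead of seven emotions, a grid of non-overlapping squares, and training accuracy as the ranking score; all function names are hypothetical.

```python
def square_grid(height, width, size):
    """Coordinates (x, y, w, h) of non-overlapping size x size squares."""
    return [(x, y, size, size)
            for y in range(0, height - size + 1, size)
            for x in range(0, width - size + 1, size)]


def crop(image, box):
    """Flatten the pixels of `image` (a list of rows) inside box (x, y, w, h)."""
    x, y, w, h = box
    return [image[r][c] for r in range(y, y + h) for c in range(x, x + w)]


def perceptron_accuracy(samples, labels, epochs=10, lr=0.1):
    """Train a binary perceptron on `samples` and return its training accuracy."""
    weights, bias = [0.0] * len(samples[0]), 0.0

    def predict(sample):
        return 1 if sum(w * v for w, v in zip(weights, sample)) + bias > 0 else 0

    for _ in range(epochs):
        for sample, target in zip(samples, labels):
            error = target - predict(sample)
            if error:
                # Classic perceptron update on a misclassified sample.
                weights = [w + lr * error * v for w, v in zip(weights, sample)]
                bias += lr * error
    return sum(predict(s) == t for s, t in zip(samples, labels)) / len(labels)


def learning_focal_points(images, labels, size, top_k):
    """Rank squares by perceptron accuracy and return the best `top_k` boxes."""
    height, width = len(images[0]), len(images[0][0])
    scored = []
    for box in square_grid(height, width, size):
        samples = [crop(img, box) for img in images]
        scored.append((perceptron_accuracy(samples, labels), box))
    scored.sort(key=lambda item: item[0], reverse=True)
    return [box for _, box in scored[:top_k]]


def make_image(flag):
    """Toy 4x4 image whose top-left 2x2 square alone depends on the class."""
    return [[flag, flag, 0.5, 0.5], [flag, flag, 0.5, 0.5], [0.5] * 4, [0.5] * 4]


images = [make_image(1.0), make_image(0.0), make_image(1.0), make_image(0.0)]
labels = [1, 0, 1, 0]
focal_points = learning_focal_points(images, labels, size=2, top_k=1)
```

In this toy setting only the top-left square carries class information, so it is the square the sketch returns as the focal point.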

Figure 3. Flowchart of the LFP algorithm
Figure 4. Execution of the perceptron on several squares


2.5. TensorFlow 

TensorFlow is an open-source machine learning framework developed by the Google Brain team
[14], [15]. It is designed to facilitate the development and deployment of machine learning models and
has strong support for building and training CNNs [16]. CNNs are a type of neural
network architecture that is particularly effective for image-related tasks, such as image classification,
object detection, and image segmentation [17]. TensorFlow provides a comprehensive ecosystem in which
users can define, train, and deploy complex models efficiently. It offers a high-level application
programming interface (API) called Keras, which simplifies the process of building and training neural
networks, including CNNs [18]. Keras is an open-source
high-level neural network API that is now tightly integrated into TensorFlow [19]. With Keras, we can
easily build and experiment with various neural network architectures using a clear and
user-friendly syntax. We therefore used it to train the neural network on the squares found by the LFP
algorithm, as shown in Figure 5.
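
The training input can be sketched as follows: the pixels inside the squares returned by the LFP algorithm are concatenated into one feature vector, which is then fed to a Keras model. The layer sizes, optimizer, and loss are assumptions chosen for illustration, and the names `crop_squares` and `build_classifier` are hypothetical; the architecture we actually used is not specified here.

```python
def crop_squares(image, boxes):
    """Concatenate the pixels inside each (x, y, w, h) box into one feature vector."""
    features = []
    for x, y, w, h in boxes:
        for row in image[y:y + h]:
            features.extend(row[x:x + w])
    return features


def build_classifier(n_features, n_classes=7):
    """Small dense network over the cropped squares (layer sizes are assumptions)."""
    import tensorflow as tf  # imported lazily; only needed to build the model

    model = tf.keras.Sequential([
        tf.keras.Input(shape=(n_features,)),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model


# Toy 4x4 image and two hypothetical LFP boxes.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
features = crop_squares(image, [(0, 0, 2, 2), (2, 2, 2, 2)])
```

A model created with `build_classifier(len(features))` could then be trained with `model.fit` on the feature vectors of all training images and their integer emotion labels.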

Figure 5. Training the neural network using the squares found by the LFP algorithm


3. RESULTS AND DISCUSSION 

Having described the methods used in this research, we now present the results obtained using the
LFP algorithm and compare them with the results of max pooling. We used the FER challenge database on
Kaggle. We conducted two experiments, as shown in Figure 6, to obtain the models: in the first we used
the LFP algorithm in the CNN, and in the second we used max pooling. The performance of these models was
evaluated based on classification accuracy (CA), precision, recall, F1 score, and receiver operating
characteristic-area under the curve (ROC-AUC) values [20]. These metrics were computed
automatically using the TensorFlow library.

Accuracy measures how often the model is correct: for classification tasks, it is the percentage of
correctly predicted instances among all instances. While accuracy is an essential metric, it might not give
a complete picture, especially in cases of imbalanced datasets.

Precision measures the ratio of true positive predictions to the total predicted positives [21], that is,
how many of the instances predicted as positive are actually positive.

Recall measures the ability of a model to identify all relevant instances within a dataset, that is, the
completeness of its predictions. It is the ratio of true positives to the sum of true positives and false
negatives [22].

The F1 score combines precision and recall, as their harmonic mean, into a single score that
represents the model's performance. It is valuable in the cases where accuracy alone is misleading, because
it accounts for both false positives (through precision) and false negatives (through recall). It is
particularly useful when there is a need to avoid both missing positive cases (low
recall) and misclassifying negative cases as positive (low precision) [23].

The ROC-AUC evaluates a model's ability to discriminate between positive and negative classes
across various threshold values [24], [25]. It is commonly used in binary classification problems and
assesses the model's performance independently of the threshold chosen for classification. A higher
AUC generally suggests that the model is better at distinguishing between the classes.
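
The definitions above can be written out directly. The sketch below is a plain-Python illustration, not the TensorFlow code used in the experiments; it assumes macro averaging over classes for precision, recall, and F1 (the averaging scheme is not stated in this section) and computes the AUC for the binary case only.

```python
def macro_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall, and F1 over all classes."""
    pairs = list(zip(y_true, y_pred))
    classes = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1_scores = [], [], []
    for cls in classes:
        tp = sum(t == cls and p == cls for t, p in pairs)  # true positives
        fp = sum(t != cls and p == cls for t, p in pairs)  # false positives
        fn = sum(t == cls and p != cls for t, p in pairs)  # false negatives
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        precisions.append(precision)
        recalls.append(recall)
        f1_scores.append(2 * precision * recall / denom if denom else 0.0)
    n = len(classes)
    return {
        "accuracy": sum(t == p for t, p in pairs) / len(pairs),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1_scores) / n,
    }


def binary_auc(y_true, scores):
    """ROC-AUC as the probability that a positive outscores a negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy predictions for three classes and toy scores for a binary problem.
metrics = macro_metrics([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0])
auc = binary_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

On the toy data, four of six predictions are correct, and three of the four positive-negative pairs are ranked correctly, which gives an AUC of 0.75.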

We ran the two experiments shown in Figure 6. In the first experiment, we used the LFP
algorithm with the neural network, and in the second experiment we used only max pooling. Table 1 reports
the results of these experiments, which demonstrate the strength of the LFP algorithm: it achieves high
precision. Furthermore, comparing experiments 1 and 2, we observe that using the LFP algorithm increases
the CA by about 10 percentage points (from 0.826 to 0.931).

Figure 6. Training the neural network using the squares found by the LFP algorithm

Table 1. Results of two experiments
Experiment No.  Algorithm      CA     Precision  Recall  F1 score  ROC-AUC
1               LFP algorithm  0.931  0.931      0.930   0.930     0.944
2               Max pooling    0.826  0.825      0.824   0.824     0.835
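
The size of the improvement can be checked directly from the values in Table 1. The short computation below only restates those values; the variable names are illustrative.

```python
# Values copied from Table 1.
lfp = {"CA": 0.931, "Precision": 0.931, "Recall": 0.930, "F1": 0.930, "ROC-AUC": 0.944}
max_pool = {"CA": 0.826, "Precision": 0.825, "Recall": 0.824, "F1": 0.824, "ROC-AUC": 0.835}

# Absolute gain of the LFP model for every metric.
gains = {name: round(lfp[name] - max_pool[name], 3) for name in lfp}

# Gain in CA relative to the max pooling baseline.
relative_ca_gain = round(gains["CA"] / max_pool["CA"], 3)
```

The absolute gain in CA is 0.105, that is, about 10 percentage points, which corresponds to a relative improvement of roughly 12.7% over the max pooling baseline.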


4. CONCLUSION 

In this work, we demonstrated the strength of the LFP algorithm for classifying emotions by analyzing
facial expressions. This will help develop our robot teacher project by giving the robot emotional
intelligence. We also showed how the CNN architecture can be changed by replacing max pooling and average
pooling with the LFP algorithm. In two experiments, we compared the two methods using several evaluation
metrics: AUC, CA, precision, F1 score, and recall. The numerical results show that the
classifier based on the LFP algorithm performs better than its competitor in terms of accuracy (0.931),
an increase in CA of about 10 percentage points. The LFP algorithm opens a long road of scientific research,
because it is currently based on a single-layer perceptron and already yields strong results.
If we use other machine learning algorithms that are more powerful than the perceptron, we expect to obtain
a further increase in CA.